Multinomial Distribution

For n independent trials, each of which results in a success for exactly one of k categories, with each category having a fixed success probability, the multinomial distribution gives the probability of any particular combination of success counts across the categories. For example, it models the counts of each face when a k-sided die is rolled n times.
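The die-rolling description can be sketched as a small simulation. This is only an illustration using Python's standard library; the helper name `roll_counts`, the seed, and the choice of a fair six-sided die rolled 60 times are assumptions made for the example, not part of the definition.

```python
import random
from collections import Counter

def roll_counts(n, probs, rng):
    """Run n independent trials and return the success count per category.

    probs[i] is the fixed success probability of category i (hypothetical
    helper, written only for this illustration).
    """
    k = len(probs)
    # Each trial lands in exactly one of the k categories.
    outcomes = rng.choices(range(k), weights=probs, k=n)
    tally = Counter(outcomes)
    return [tally[i] for i in range(k)]

rng = random.Random(0)                      # seeded for repeatability
counts = roll_counts(60, [1 / 6] * 6, rng)  # fair six-sided die, 60 rolls
print(counts, sum(counts))                  # the counts always add up to n
```

Each call produces one draw of the count vector (X1, ..., Xk); the multinomial distribution describes how likely each such vector is.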

Specification

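$$f(x_1,\ldots,x_k;n,p_1,\ldots,p_k) = \Pr(X_1 = x_1 \text{ and } \dots \text{ and } X_k = x_k) = \begin{cases} \dfrac{n!}{x_1!\cdots x_k!}\, p_1^{x_1}\cdots p_k^{x_k}, & \text{when } \sum_{i=1}^{k} x_i = n \\ 0 & \text{otherwise,} \end{cases}$$

for non-negative integers $x_1,\ldots,x_k$.

When $\sum_{i=1}^{k} x_i = n$, the same probability can be written using the gamma function:

$$f(x_1,\ldots,x_k;n,p_1,\ldots,p_k) = \frac{\Gamma\left(\sum_i x_i + 1\right)}{\prod_i \Gamma(x_i + 1)} \prod_{i=1}^{k} p_i^{x_i}.$$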
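The probability mass function can be evaluated directly. The sketch below is a minimal standard-library version: `multinomial_pmf` follows the factorial form and `multinomial_pmf_gamma` follows the gamma-function form, computed in log space. Both function names and the example inputs are assumptions chosen for illustration.

```python
import math

def multinomial_pmf(counts, n, probs):
    """Factorial form: n! / (x1! ... xk!) * p1^x1 ... pk^xk, or 0 if sum != n."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    if any(x < 0 for x in counts) or sum(counts) != n:
        return 0.0
    coeff = math.factorial(n)
    for x in counts:
        coeff //= math.factorial(x)  # stays an exact integer at every step
    prob = float(coeff)
    for x, p in zip(counts, probs):
        prob *= p ** x
    return prob

def multinomial_pmf_gamma(counts, probs):
    """Gamma form, taking n = sum(counts); uses logs to avoid huge factorials."""
    if any(p == 0 and x > 0 for x, p in zip(counts, probs)):
        return 0.0  # a zero-probability category cannot have successes
    log_coeff = math.lgamma(sum(counts) + 1) - sum(
        math.lgamma(x + 1) for x in counts
    )
    log_prob = sum(x * math.log(p) for x, p in zip(counts, probs) if x > 0)
    return math.exp(log_coeff + log_prob)

# Three trials, one success in each of three categories:
# 3! / (1! 1! 1!) * 0.2 * 0.3 * 0.5 = 6 * 0.03 = 0.18
p_direct = multinomial_pmf([1, 1, 1], 3, [0.2, 0.3, 0.5])
p_gamma = multinomial_pmf_gamma([1, 1, 1], [0.2, 0.3, 0.5])
print(p_direct, p_gamma)
```

The log-space version is usually the safer choice for large n, since the factorials themselves grow very quickly.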

In many fields, especially natural language processing (NLP), the categorical distribution (the special case of a single trial, n = 1) is often loosely called a multinomial distribution, although this usage is imprecise.
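To make the relationship concrete: the categorical distribution is the multinomial distribution with a single trial. Setting n = 1 in the probability mass function from the Specification section, the count vector has one entry equal to 1 and the rest equal to 0, so the multinomial coefficient is 1 and (with j denoting the category that occurred, an index introduced here for illustration)

$$f(x_1,\ldots,x_k;1,p_1,\ldots,p_k) = \prod_{i=1}^{k} p_i^{x_i} = p_j \quad \text{when } x_j = 1 \text{ and } x_i = 0 \text{ for } i \neq j.$$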

Properties

Expectation

$$\operatorname{E}(X_i) = n p_i$$
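One way to see this, sketched here as a supporting step rather than taken from the source, is to write each count as a sum of indicator variables. The symbol $I_{t,i}$, equal to 1 when trial t falls in category i and 0 otherwise, is notation introduced only for this derivation:

$$X_i = \sum_{t=1}^{n} I_{t,i}, \qquad \operatorname{E}(I_{t,i}) = p_i \quad \Longrightarrow \quad \operatorname{E}(X_i) = \sum_{t=1}^{n} p_i = n p_i.$$

For example, for a fair six-sided die rolled 60 times, each face has an expected count of 60 × 1/6 = 10.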

Covariance matrix

Each diagonal entry is the variance of a binomially distributed random variable, and is therefore

$$\operatorname{var}(X_i) = n p_i (1 - p_i)$$

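The off-diagonal entries are the covariances:

$$\operatorname{cov}(X_i, X_j) = -n p_i p_j$$

for i, j distinct. The covariances are negative because the counts must sum to n, so a larger count in one category leaves fewer trials for the others.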
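The mean and covariance formulas can be checked numerically for a small case by summing over every possible count vector. The sketch below uses only the standard library; the function names `exact_moments` and `closed_form_cov` and the example values n = 4, p = (0.2, 0.3, 0.5) are assumptions made for illustration. Brute-force enumeration like this is only practical for small n and k.

```python
import itertools
import math

def multinomial_pmf(counts, probs):
    """Probability of a count vector, taking n = sum(counts)."""
    coeff = math.factorial(sum(counts))
    for x in counts:
        coeff //= math.factorial(x)
    return coeff * math.prod(p ** x for p, x in zip(probs, counts))

def exact_moments(n, probs):
    """Mean vector and covariance matrix by enumerating all count vectors."""
    k = len(probs)
    mean = [0.0] * k
    second = [[0.0] * k for _ in range(k)]  # accumulates E(Xi * Xj)
    for counts in itertools.product(range(n + 1), repeat=k):
        if sum(counts) != n:
            continue  # probability is 0 unless the counts sum to n
        w = multinomial_pmf(counts, probs)
        for i in range(k):
            mean[i] += w * counts[i]
            for j in range(k):
                second[i][j] += w * counts[i] * counts[j]
    cov = [[second[i][j] - mean[i] * mean[j] for j in range(k)]
           for i in range(k)]
    return mean, cov

def closed_form_cov(n, probs):
    """Covariance matrix from the formulas: n*pi*(1-pi) and -n*pi*pj."""
    k = len(probs)
    return [[n * probs[i] * (1 - probs[i]) if i == j
             else -n * probs[i] * probs[j]
             for j in range(k)] for i in range(k)]

n, probs = 4, [0.2, 0.3, 0.5]
mean, cov = exact_moments(n, probs)
formula_cov = closed_form_cov(n, probs)
print(mean)         # compare with [n * p for p in probs]
print(cov)          # compare with formula_cov, entry by entry
print(formula_cov)
```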

Reference

Multinomial distribution: https://en.wikipedia.org/wiki/Multinomial_distribution